A METHOD OF UPDATING A DATA SOURCE FROM TRANSFORMED DATA



Field of the invention 

The present invention relates to a method of updating a data source from
transformed data.


Background of invention 

There are currently several types of database structures of which the most 
common examples are the flat file database structure, the relational database 
structure and the hierarchical database structure. 

Most database programmers are familiar with relational databases (e.g.
MsSQL, Oracle) and flat file databases (e.g. a text file with delimiters).

Data in text files or relational databases can easily be retrieved and updated. A
line of data to be modified in a text file database is simply inserted, deleted or
updated by direct user action in a text editor, from a simple script program or from an
advanced user interface program. If the data in a relational database is to be
modified, this can easily be done by an insert, delete or update command in a
query language such as SQL, which uses foreign keys and selection criteria to
modify related tables.
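For comparison with the XML case discussed below, the kind of targeted update that a relational database allows can be sketched with Python's built-in sqlite3 module. The table, columns and rows are illustrative assumptions, not taken from the source.

```python
import sqlite3

# In-memory database with an illustrative table (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, section TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO books (section, title) VALUES (?, ?)",
    [("biography", "Mr Chew's Life"), ("fiction", "Another Story")],
)

# One declarative statement selects the row by a criterion and updates it;
# no document has to be opened and edited by hand.
conn.execute(
    "UPDATE books SET title = ? WHERE title = ?",
    ("Mr Chew's Other Life", "Mr Chew's Life"),
)
conn.commit()

titles = [row[0] for row in conn.execute("SELECT title FROM books ORDER BY id")]
print(titles)
```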

An example of a hierarchical database structure is XML. In an XML database,
information is arranged in such a manner that one 'drills down' into branches of
the database to retrieve the information required. This can be imagined as a
tree with many levels of branches, where the data at the ends of the branches is
classified according to the branches. In other words, XML source documents
consist of data structured into 'trees'.
XML data is commonly used by, for example, XSL processors to produce an output
result. Typically, a process begins with a source tree (or source document) and
ends with a result tree (or result/transformed document, which may be any type
of marked-up document or simply a text document). Central to a data tree is the
concept of nodes, which are data objects categorised as 'branches' of the tree.
There are several node types, and these are known to the man skilled in the art.
A simplified example of the types will be given here using the incomplete XML
document shown below:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="link.xsl"?>
<Elmt1><Elmt2>Hello World</Elmt2></Elmt1>


1. Root nodes: there can only be one root node, because it represents the
document itself. The root node of the XML document is the whole of the
document.



2. Element nodes each represent an element and usually consist of a pair of
element tags. The root element in the example is <Elmt1></Elmt1>.

The first child node of the root element is another element, <Elmt2></Elmt2>.
Sometimes, elements may not be defined by tag pairs but by a single tag,
such as <br> in HTML, which is simply a line break. Element nodes usually
contain textual information.


3. Text nodes consist of character data, i.e. text or strings, e.g. 'Hello World'.
Generally, text nodes are found enclosed in element nodes.
4. Attribute nodes contain information on attributes, such as text styles,
which are written inside element tags.

5. Namespace nodes: each element has a namespace node corresponding to
its namespace prefix, such as 'xsl:'. Such nodes ensure that different elements
with the same name, or different attributes with the same name in the same
element, can be distinguished. Namespace nodes are optional in XML
documents and are not always used.

6. Processing instruction nodes contain processing instructions, such as the
<?xml-stylesheet?> line in the example above.


7. Comment nodes contain comments, which are enclosed in
<!-- comment -->.
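The node kinds listed above can be observed with Python's standard xml.dom.minidom parser. This is a minimal sketch under a few assumptions: the example document is completed with a closing </Elmt1> tag, and an illustrative attribute and comment are added so that every kind appears. The DOM does not expose XPath-style namespace nodes as a separate node type, so those are not shown.

```python
from collections import Counter
from xml.dom import Node, minidom

# The example document, completed, with an illustrative attribute and comment.
DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="link.xsl"?>'
    '<Elmt1 style="plain"><Elmt2>Hello World</Elmt2><!-- note --></Elmt1>'
)

KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
}


def walk(node):
    """Depth-first walk that also visits each element's attribute nodes."""
    yield node
    if node.nodeType == Node.ELEMENT_NODE:
        attrs = node.attributes
        for i in range(attrs.length):
            yield attrs.item(i)
    for child in node.childNodes:
        yield from walk(child)


kinds = Counter(KIND.get(n.nodeType, "other") for n in walk(minidom.parseString(DOC)))
print(dict(kinds))
```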

As a further example of the nodes in an XML document, the line of code:

<div> this is <b>bold</b> text </div>
has five nodes, i.e.:

<div></div> (a pair of tags forming an element node)

this is (a text node)

<b></b> (formatting tags, also an element node)

bold (another text node)

text (a third text node)
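The five-node count for the <div> line can be checked with xml.dom.minidom; the helper name list_nodes is hypothetical. Note that the whitespace around 'this is' and 'text' is kept as part of the text nodes.

```python
from xml.dom import Node, minidom


def list_nodes(node, out):
    """Append (kind, value) for every element and text node, depth first."""
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            out.append(("element", child.tagName))
            list_nodes(child, out)
        elif child.nodeType == Node.TEXT_NODE:
            out.append(("text", child.data))
    return out


nodes = list_nodes(minidom.parseString("<div> this is <b>bold</b> text </div>"), [])
for kind, value in nodes:
    print(f"{kind:8} {value!r}")
```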

A powerful feature of XML is that it provides a way of sharing data between
different data systems, regardless of how the data is structured. The syntax
of XML is similar to that of HTML, with tags identifying the purpose of each
piece of data, except that users define the tags themselves. This
provides versatility in data classification and identification. As a result, XML
imparts inter-system transportability to data.


An XML document's tags are interpreted, or the data therein selected and
manipulated, in a process called 'transformation' or 'query', depending on the
type of XML processing script used. The output of the transformation may be
data encoded in HTML syntax, or another XML document. The transformed
XML document encodes selected data in tags defined by a recipient user so
that the data may now be recognised by the recipient user's system. Therefore,
XML does not suffer from the conventional difficulty of migrating data between
databases with different table structures, as occurs with the relational database
structure. Examples of such transformation/query technologies are XSLT
(Extensible Stylesheet Language Transformations) and XQuery. The use of
XQuery and XSLT is well known to a man skilled in the art.

The present limitation of XML is that there is no standard way by which
modifications may be made to XML data through a script language, unlike in a
relational database (e.g. SQL). XML updates have to be made through a text
editor or, if the XML document is generated from a relational database, the
document has to be regenerated after the information has been updated in the
relational database.

Referring to the example below, if an XML document user wants to correct the
title "Mr Chew's Life" to "Mr Chew's Other Life", he has to look for the element
node <bio></bio> containing the text node "Mr Chew's Life" and then type
over the book title in a text editor to change it (the book titles enclosed in the
element tags are nodes in their own right, known as text nodes).

Furthermore, if the modification of XML data is to be initiated on a transformed
XML document in such a way that a corresponding update in the source
document is effected as well, the updating action is no longer just a
straightforward replacement of data through a text editor. At least two
documents are now involved, the transformed document and the original
source document, and modifications to the data in the transformed document must
be properly reflected in the correct source document(s).
The update action becomes particularly difficult if it involves deletion or insertion
of element nodes, such that the structure of the transformed XML document
changes, while the software engine for updating the source nodes relies on
mirrored node positions between the source and transformed documents to link
them.

Current technologies such as XPath, XPointer and XQuery define node
positions in an XML document by the nodes' sequential order or by indexing the
nodes. However, such node positions are not persistent through editing actions.
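The lack of persistence can be shown with the limited XPath subset supported by Python's xml.etree.ElementTree: a positional path such as book[2] selects a different node once an earlier sibling is deleted. The document is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

shelf = ET.fromstring("<shelf><book>A</book><book>B</book><book>C</book></shelf>")

# A positional path in the spirit of XPath: "the second <book> child".
before = shelf.find("book[2]").text

# Delete the first sibling, as an editing action might.
shelf.remove(shelf.find("book[1]"))

# The same path now resolves to a different node.
after = shelf.find("book[2]").text
print(before, "->", after)
```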

A transformation/query performed on an XML document is analogous to an SQL
query. However, re-arranging, inserting and deleting data in an XML source
document through actions on a transformed document is presently impossible,
especially if the XML structure is modified. Node indices, i.e. the ordering of node
positions, are not persistent when sibling elements are inserted, removed or
re-arranged, making it impossible to maintain proper links between a source XML
document and the transformed XML document after multiple transformations or
queries. Due to these limitations, it is difficult to manage an XML document like
an SQL database.

US patent application 20030037303 proposes a mechanism for reversing XML
transformations, which allows data in an XML source document to be updated by
actions through a user interface displaying the transformed document. Referring
to Fig. 1, the US application proposes generating 16 an inverse-XSLT sheet 15
during a forward transformation by an XSLT style sheet 12 of an XML source
document 11. When a viewer looks at the transformed document, which may be
in XML or displayed as an HTML web page 13, and decides to correct or update
a piece of data, he may do so on the display 13 itself. The updated display 14
is then inverse-transformed by the inverse-XSLT sheet 15 back into a source
document 11. In other words, a new source document is created which replaces
the original one. This mechanism can be used only for a transformation process
where there is a one-to-one mapping between the original document and the
transformed document, as the inverse transformation script has an expectation
of what the structure of the document to be inverse-transformed should look like.

However, transformations do not always consist of a single-step process.
In sophisticated XSLT and XQuery transformations, such as a multi-step
transformation, the nodes in the resultant document may not have a one-to-one
mapping mirroring the original nodes in the source document.

For example, an XML source document having one parent element node and two
child element nodes, each of which contains one text node, such as that shown
in Fig 2, may be filtered in a transformation to extract only part of the data. The
resultant extract is output in XML tags defined by another XML developer as one
parent node, one child element node and one text node. If the viewer changes
'text1' in the transformed document when viewing it in a display, then, in order that
the data is updated in the source document, the inverse transformation must be
able to identify which of the original nodes in the data branch 'book1' 'text1'
came from.

Furthermore, if instead of just one source document, several source
documents are used to produce a resultant transformed document, an inverse
XSLT transformation script of the kind proposed in US patent application
20030037303 has no way of determining which of the original documents
a piece of data in the resultant document came from, and therefore which source
document to update.
The present invention aims to provide a data source structure and a method of
operating upon that structure to assist in updating a database by operations on
data transformed or extracted from the data source.

An advantage of the described embodiment of the invention is that the method
is not limited by the kind of data transformation employed, the number of data
sources used, the number of passes of data transformations or the type of
presentation. The data transformation process can be arbitrarily complicated, as long as
the source node identifiers are passed on after each transformation.

According to the invention in a first aspect, a method of identifying data in a
node-based data source is provided, comprising the step of annotating each
node with a unique identifier.

According to the invention in a second aspect, a method of modifying a node-based
data source is provided, comprising the steps of associating selected
nodes in the data source with identifiers, identifying a node to be modified by
reference to its identifier, and modifying the node data.

According to the invention in a third aspect, a data source structured to operate 
as a node-based data source is provided, wherein at least one node is 
associated with a unique identifier.

According to the invention in a fourth aspect, a method of annotating a 
transformed version of a data source is provided, comprising the steps of 
copying identifiers of the nodes in a data source to corresponding nodes in the
transformed version of the data source. 

According to the invention in a fifth aspect, an identifier is provided which is capable of
uniquely identifying a node in the data source and also a corresponding node in
a transformed version of the data source, whereby the node in the
data source is mapped to the corresponding node in the transformed version of
the data source.

According to the invention in a sixth aspect, a data transformation engine is
provided, comprising means for copying identifiers of nodes in the data source
and inserting the identifiers into the nodes of the transformed version of the data
source, whereby the nodes in the transformed version of the source data are
mapped to their corresponding nodes in the source data.

According to the invention in a seventh aspect, an industrial standard of node-based
document modification is provided.

According to the invention in an eighth aspect, an industrial standard of
identification of nodes in a node-based document is provided.

According to the invention in a ninth aspect, an industrial standard of node-based
data transformation is provided.

In general terms, the described embodiments of the invention provide a 
mechanism that allows a node-based data source, such as an XML document, 
to be updated from transformed data.

The mechanism comprises ways of preserving a link between a source
document(s) and a transformed document, using annotations (or universal
identifiers) so that the source of a piece of transformed data is always known.
The source location of each piece of data is obtained from the identifiers, even
after multiple passes of sophisticated transformation, and even when the
resultant document is a combination of multiple source documents. In contrast
to US patent application 20030037303, there is no need to generate an inverse
transformation script every time the source document is modified or transformed in
order for an update action to be performed. Furthermore, the
mechanism allows update requests to be sent directly to the original data
source by means of the identifiers, instead of through an inverse script.

The embodiments described hereafter illustrate how node annotation allows
data manipulation in an XML document, such that command procedures and
routines may be designed to work on the documents in the same way as scripts
are used to update other types of database, such as relational tables or a
flat-file database. This imparts data interaction possibilities to XML engines,
parsers, editing agents and displays, making XML a more user-friendly form of
database structure.

Brief description of the figures 

Embodiments of the invention will now be described, by way of example, with
reference to the accompanying drawings, in which:

Fig 1 is a flowchart illustrating the process of the prior art found in US
application 20030037303.

Fig 2 shows a prior art example of a transformation based on two source XML
documents.


Fig 3 is a flowchart illustrating a method of XML source document updating 
according to an embodiment of the invention. 


Fig 4 is a flowchart further illustrating the method of XML source document 
updating of Fig. 3. 

Fig 5 illustrates how the position of a new node may be decided in an
insertion or node creation action, giving a further example of the insertion
method disclosed in Fig 4.
Fig 6 illustrates how several source documents may be transformed yet
retain a traceable link to the original source documents according to the
methods of Figs 3, 4 and 5.

Fig 7 illustrates a method of modifying a transformation script by using the
method disclosed in Figs 3 and 4.
Interactions between source document agent, transformation agent, editing agent and viewer


Fig 3 is a flowchart showing an embodiment of the method of the invention. A
source document agent 300 is provided which manages one or more source
XML document(s) 301, 302, 303. Each of the XML documents 301, 302, 303 has
a plurality of nodes associated with data.

A transformation agent 310, on being triggered, runs one or more
transformation scripts 311, which may be written in XSLT, XQuery or any other
XML transformation language. A transformation script 311 determines how an
XML document is transformed into another format. The transformation may
include data selection and extraction, or combination of data from several XML
documents, in ways analogous to queries performed on the tables of a
relational database.


An editing agent 320 receives transformed documents 321 produced by the
transformation agent 310 and displays them (e.g. in a browser if the
transformed documents are HTML documents).

When a transformation of the XML document(s) 301, 302, 303 is triggered, the
source document agent 300 first annotates each node in the source XML
document(s), i.e. the element nodes, the text nodes and so on, with a unique
identifier. The transformation agent 310 then receives (at 330) the annotated
XML document(s) 301, 302, 303 and performs the transformation/query according
to the instructions in the transformation/query script 311, producing a
transformed document having data selected from the source document(s) 301,
302, 303. Whether the transformed document is of a different document format,
such as HTML, or is also an XML document, the tags defining the element
nodes in the transformed document can be completely different from those in
the original XML source documents. Nevertheless, when the transformation
agent 310 encloses the extracted text information in tag pairs defined for the
transformed document 321, the transformation agent 310 also transfers into the
tag pairs the same identifiers which identified the data in the source nodes in
the source documents 301, 302, 303. Similarly, the text is also extracted and
produced in the transformed document as text nodes, which are also annotated
with the same identifiers as their respective source nodes.
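The annotate-then-transform step can be sketched in Python with xml.etree.ElementTree. This is not the patent's own implementation: the attribute name uid, the ID numbering scheme and the toy transformation (standing in for an XSLT or XQuery script 311) are assumptions. Because ElementTree has no separate text-node objects, each text datum here is identified through its enclosing element.

```python
import itertools
import xml.etree.ElementTree as ET

UID = "uid"  # hypothetical name of the annotation attribute


def annotate(root, counter):
    """Give every element that lacks an identifier a fresh unique one."""
    for el in root.iter():
        if UID not in el.attrib:
            el.set(UID, f"ID{next(counter)}")
    return root


def transform(source):
    """Toy transformation: re-tag each <book> as an HTML <li>, copying the
    source identifier into the new element along with the extracted text."""
    out = ET.Element("ul")
    for book in source.iter("book"):
        li = ET.SubElement(out, "li", {UID: book.get(UID)})
        li.text = book.text
    return out


source = annotate(ET.fromstring("<shelf><book>A</book><book>B</book></shelf>"),
                  itertools.count(1))
result = transform(source)
print(ET.tostring(result, encoding="unicode"))
```

The output tags (<ul>, <li>) differ completely from the source tags, yet each <li> still carries the identifier of the <book> it came from.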

The transformed document 321 is then sent (at 331) to the editing agent 320,
where the transformed document 321 is displayed (at 332) to a viewer 37 of the
document. The editing agent 320 may be a user interface, or may be embedded in
a user interface, which allows the viewer 37 to interact with the displayed data,
such as to perform data updating (data 'updating' shall hereafter in this
description be understood to include data replacement, insertion, deletion,
copying and so on).

Any modification of the displayed data by the viewer may now be mirrored in
the source XML document(s) 301, 302, 303 by means of the identifiers.

An update action (at 341) initiated by the viewer causes the editing agent to
send an update request (at 340) directly to the source document agent 300,
which performs updating on the source XML document(s) 301, 302, 303. The
update request (at 340) is sent along with the identifiers of the nodes on which
the viewer 37 has acted.


The source document agent 300, on receiving the update request, has to
decide whether the update request is allowable, for example according to security
constraints or the schema of the source document, or by applying rules implemented by the
programmer. If the source document agent 300 accepts the request, the source
document agent 300 uses the node identifiers and data values sent with
the request to locate and update the correct source nodes in the source
document(s) 301, 302, 303, be they text nodes, attribute nodes or element nodes
formed by tags or tag pairs.
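A minimal sketch of the source document agent's side of an update request, assuming identifiers are stored in a hypothetical uid attribute. The UpdateRequest shape, the rule function and the (accepted, reason) return value are hypothetical choices; the reason string is the kind of information a rejection response could carry back to the editing agent.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

UID = "uid"  # hypothetical annotation attribute (not defined by the source)


@dataclass
class UpdateRequest:
    node_id: str   # identifier the editing agent read from the transformed node
    new_text: str  # replacement data entered by the viewer


def allow_all(request):
    return True


def handle_update(source, request, allowed=allow_all):
    """Source-agent side: vet the request, then apply it by identifier.
    Returns (accepted, reason) so a rejection can be shown to the viewer."""
    if not allowed(request):
        return False, "rejected by update rules"
    target = next((el for el in source.iter() if el.get(UID) == request.node_id), None)
    if target is None:
        return False, f"no node with identifier {request.node_id}"
    target.text = request.new_text
    return True, "updated"


def no_blank_titles(request):
    """Example programmer-defined rule: refuse empty replacement text."""
    return bool(request.new_text.strip())


source = ET.fromstring('<shelf uid="ID1"><book uid="ID2">A</book></shelf>')
ok = handle_update(source, UpdateRequest("ID2", "A'"), no_blank_titles)
refused = handle_update(source, UpdateRequest("ID2", "  "), no_blank_titles)
missing = handle_update(source, UpdateRequest("ID9", "X"))
```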

After the source document agent 300 has completed the updating, the updated
XML source documents 301, 302, 303 are sent again (at 330) to the transformation
agent 310 to have the same transformation script 311 re-run on the updated
source document(s) 301, 302, 303. The newly transformed document 321 is
then sent to the editing agent 320 so that the display is refreshed, reflecting the
data update. The process repeats as long as the viewer keeps updating the
data.

Optionally, the re-transformation of the updated source document(s) 301, 302,
303 may be performed partially, by re-transforming only updated or new nodes.
Before re-transformation, nodes which have already been annotated are not
annotated again. This ensures that the identifiers persist throughout many
transformations. However, new nodes which are inserted into the source
documents for the first time are annotated with unique identifiers before
re-transformation.
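Identifier persistence depends on annotation being idempotent. In this sketch (hypothetical uid attribute and numbering scheme), a second annotation pass labels only the newly inserted node and leaves existing identifiers untouched.

```python
import itertools
import xml.etree.ElementTree as ET

UID = "uid"  # hypothetical annotation attribute


def annotate(root, counter):
    """Label only elements that have no identifier yet, so existing
    identifiers survive every update/re-transform cycle."""
    for el in root.iter():
        if UID not in el.attrib:
            el.set(UID, f"ID{next(counter)}")


counter = itertools.count(1)  # one counter kept by the source agent
shelf = ET.fromstring("<shelf><book>A</book><book>B</book></shelf>")
annotate(shelf, counter)
first = [el.get(UID) for el in shelf.iter()]

ET.SubElement(shelf, "book").text = "D"  # a node inserted by an update
annotate(shelf, counter)
second = [el.get(UID) for el in shelf.iter()]
```

The counter has to outlive individual update cycles; restarting it would hand out identifiers that are already in use.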



If the update request (at 340) sent to the source document agent 300 is not deemed
acceptable by the source document agent 300, the source document agent 300
sends a rejection response (at 360) to the editing agent 320. The display will
then show a rejection message (at 361) to the viewer 37, preferably also
showing the reason for the rejection of the update request.


As mentioned above, the transformation of the source XML document may,
instead of a single-step transformation, comprise multiple transformations.
These may be implemented by executing a number of XSLT and/or XQuery
scripts 311 in succession, i.e. each XML document transformed by one
transformation script 311 is immediately subjected to a further transformation by
another transformation script 311, until the final transformed document 321 is
sent to the editing agent 320.
Similarly to the annotation mechanism described above, the first transformation in
a multiple-transformation sequence triggers the source document agent 300 to
annotate the source document(s) 301, 302, 303 with unique node identifiers,
which are inherited by the first transformed document. In transformations
subsequent to the first one, the identifiers of the nodes in each earlier transformed
document are in turn inherited by the nodes in the next transformed document.
Thus, the nodes in the final multi-transformed document 321 are traceable to
the source nodes in the original XML source document(s) 301, 302, 303.
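Inheritance across passes can be sketched by chaining two toy transformations, each of which copies the identifier of the node it reads into the node it writes. The pass functions, element names, genre attribute and uid attribute are illustrative assumptions standing in for a chain of XSLT/XQuery scripts 311.

```python
import xml.etree.ElementTree as ET

UID = "uid"  # hypothetical annotation attribute


def pass1(doc):
    """Select fiction titles into <item> elements, inheriting identifiers."""
    out = ET.Element("items")
    for book in doc.iter("book"):
        if book.get("genre") == "fiction":
            ET.SubElement(out, "item", {UID: book.get(UID)}).text = book.text
    return out


def pass2(doc):
    """Re-tag the intermediate result as HTML list items, again inheriting."""
    out = ET.Element("ul")
    for item in doc.iter("item"):
        ET.SubElement(out, "li", {UID: item.get(UID)}).text = item.text.upper()
    return out


source = ET.fromstring(
    '<shelf uid="ID1">'
    '<book uid="ID2" genre="fiction">a tale</book>'
    '<book uid="ID3" genre="biography">a life</book>'
    '<book uid="ID4" genre="fiction">a saga</book>'
    '</shelf>'
)
final = pass2(pass1(source))


def source_text(node_id):
    """Follow an identifier from the final document back to the source."""
    return source.find(f".//book[@{UID}='{node_id}']").text


links = {li.get(UID): source_text(li.get(UID)) for li in final}
```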

The source document agent 300, therefore, on receiving update requests from
the editing agent 320, can carry out updating of the correct nodes in the source
document(s) 301, 302, 303, even in response to user action on an extensively
multi-transformed document. In other words, the links between the nodes of the
source XML document(s) 301, 302, 303 and those of the transformed document
321 are not lost, even if the transformed document 321 is the result of many
transformation passes and has a totally different structure from that of the
original source document(s) 301, 302, 303. Furthermore, even if there is more
than one source document, the nodes can still maintain persistent links to the
different source documents 301, 302, 303, as the identifiers are unique across
all source documents and nodes.
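One way (assumed here, not prescribed by the source) to make identifiers unique across several source documents is to qualify each one with a document name, so that an update can be routed to the right source purely from its identifier. The document names doc301 and doc302, the "#" separator and the uid attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

UID = "uid"  # hypothetical annotation attribute


def annotate(doc_name, root):
    """Initial annotation: prefix each identifier with the document name so
    it is unique across all source documents, not just within one."""
    for n, el in enumerate(root.iter(), start=1):
        el.set(UID, f"{doc_name}#{n}")
    return root


sources = {
    "doc301": annotate("doc301", ET.fromstring("<a><x>one</x></a>")),
    "doc302": annotate("doc302", ET.fromstring("<b><y>two</y></b>")),
}

# Combine data from both sources into one transformed document.
merged = ET.Element("list")
for root in sources.values():
    for el in root:
        ET.SubElement(merged, "li", {UID: el.get(UID)}).text = el.text


def apply_update(node_id, text):
    """Route the update to the owning document, then locate the node."""
    doc = sources[node_id.split("#", 1)[0]]
    target = next(el for el in doc.iter() if el.get(UID) == node_id)
    target.text = text


apply_update(merged[1].get(UID), "TWO")  # viewer edits the second list item
```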

Further example of the effect of updating actions on XML document nodes

Fig 4 is another flowchart, which further illustrates the above-described
mechanism of Fig 3, and also shows how, in addition to modification of existing
nodes, new nodes can be added. The example of Fig 4 uses only text nodes
for the sake of simplicity, but it is equally representative of other
processable XML node types, such as element nodes formed by single
tags or tag pairs containing text nodes. Note that in Fig 4, an original unmodified
source document 40a and the same source document after modification, 40b, are
shown, and both are also represented by the label '40' in the figure.

A source XML document 40a has three pieces of data as text nodes, A, B and
C. When a transformation is triggered, the nodes are first annotated with node
identifiers ID1, ID2 and ID3 respectively. The nodes are then subjected to
multiple transformations 41, 42, 43 defined in XSLT scripts (alternatively,
XQuery or other kinds of transformation script may be used, or the steps 41, 42,
43 may each use a different type of transformation script).


The transformed document 44 may be another XML document or an HTML
document, which is displayed to a viewer having read/write access. The
transformations/queries 41, 42, 43 select and extract into the transformed
document 44 only the data A and B, which become text nodes 441, 442 in the
transformed document. The node 403 of data C having the identifier 'ID3' is not
selected by the transformation script because of, for example, selection criteria.
The identifier 'ID1' for the first text node 401, which is data A, is also copied to
the transformed document 44 as the identifier of the corresponding text node
441 containing data A. Similarly, 'ID2' is copied to the transformed document 44
as the identifier of text node 442 of data B.

A viewer (human operator) then modifies node 441 from data A to A', deletes node
442 (data B) and inserts new data D (which creates a new un-annotated node
443) in the transformed document 44.


When an update command 451 is sent to the source document agent, the 
identifier 'ID1' of the text node 441, now A', is also sent to the source document 
agent with the new data A', so that the source document agent can update the 
correct node 401 having the identifier 'ID1' to be A' instead of A. 


Acting on the transformed document, the user deletes the text node 442 of data
B, identified by 'ID2'. When the editing agent sends the deletion request 452 to
the source document agent, the source document agent deletes the
corresponding source node 402 in the source document 40 identified by the
identifier 'ID2', effectively removing data B from the source document 40b.

As the nodes are identified by the identifiers, the deletion of the nodes 402, 442
containing B, identified by 'ID2', though changing the XML tree structures in the
transformed document 44 and the source document 40, does not change the
identifier 'ID1' of the node of A'. Any source node in the source document is,
therefore, still 'linked' to the corresponding node in the transformed document
by its identifier. In this manner, updating and modification of data in the
source document is independent of inverse transformation scripts, can
be effected by actions performed on the transformed document and works
regardless of any difference between the structures of the source and
transformed documents or any changes in those structures.
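The Fig 4 modification and deletion requests can be sketched as follows. Assumptions: identifiers are held in a hypothetical uid attribute, and each datum is wrapped in an element because ElementTree does not model text nodes as separate objects.

```python
import xml.etree.ElementTree as ET

UID = "uid"  # hypothetical annotation attribute

# Source document 40a after annotation: A, B and C carry ID1..ID3.
source = ET.fromstring(
    '<doc><d uid="ID1">A</d><d uid="ID2">B</d><d uid="ID3">C</d></doc>'
)


def locate(root, node_id):
    """Return (parent, element) for the element carrying node_id."""
    for parent in root.iter():
        for child in parent:
            if child.get(UID) == node_id:
                return parent, child
    raise KeyError(node_id)


def modify(root, node_id, text):
    locate(root, node_id)[1].text = text


def delete(root, node_id):
    parent, child = locate(root, node_id)
    parent.remove(child)


# Requests 451 and 452 as sent by the editing agent.
modify(source, "ID1", "A'")
delete(source, "ID2")
print([(d.get(UID), d.text) for d in source])
```

Deleting ID2 shifts C to the second position, but lookups go by identifier rather than position, so ID1 and ID3 remain addressable.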


It is important to note that if the text node inside an element node is deleted, the
element node, which is usually formed by an element tag pair, and the element
node's identifier remain in the XML document, albeit enclosing no text node.
However, when an element node (i.e. the tags defining the element node) is
deleted, the entire element node, its identifier, any text nodes inside the
element node, and any child node(s) in the element node are all removed from
the XML document, and the XML document structure is changed.
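The difference between deleting a text node and deleting an element node can be sketched with ElementTree; the element names, titles and the uid attribute are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

UID = "uid"  # hypothetical annotation attribute

section = ET.fromstring(
    '<section uid="ID1">'
    '<bio uid="ID2">Some Life</bio>'
    '<bio uid="ID3"><title uid="ID4">Another Life</title></bio>'
    '</section>'
)

# Deleting only the text: the element and its identifier remain, now empty.
section.find(f"*[@{UID}='ID2']").text = None

# Deleting the element: its identifier and every descendant go with it.
section.remove(section.find(f"*[@{UID}='ID3']"))

remaining = [el.get(UID) for el in section.iter()]
print(remaining, ET.tostring(section, encoding="unicode"))
```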

When the data D 443 is sent to the source document agent for insertion, the
insertion command cannot be sent with an identifier, since D is newly
created/inserted in the transformed document and does not have an identifier.
Different strategies can be employed to decide where this new text node is
placed in the source document 40.

As an illustration of the many possible insertion strategies, the editing agent
may be programmed to reference, in the transformed document 44, the sibling
or neighbouring node next to which the new node is to be created. For example,
the text node 441 of data A having the identifier 'ID1' may be referenced (the
node 442 containing B having identifier 'ID2' is not used, as it has been deleted
earlier). The editing agent then sends a node insertion/creation request to the
source document agent along with the identifier of the sibling node, 'ID1'.

When the update command is sent to the source document agent, the source
document agent looks for a node having the identifier 'ID1' in the source
document 40 to position/insert the new source node 404 with reference to node
401. Specifically, whether creating the new node means creating it in a position
above or below the reference node 401 in the source XML document is left to the
developer designing the insertion mechanism to specify. The example of the
strategy given here for text nodes is applicable to other processable nodes,
such as element nodes.

Another insertion mechanism strategy is one in which the parent node of the
new node to be inserted is identified, instead of the sibling node. The source
document agent will simply insert a new child node under the corresponding
parent node in the source document, after all the existing child nodes.

Referring to Table 1 as an example, and assuming that the XML document is a
transformed document, if a viewer wants to add a new book 'The Life Of
Claude' in the biography section, the editing agent must allow the viewer to
insert 'The Life Of Claude' into the transformed document (by generating the
suitable element tags <bio></bio> to form an element node enclosing the title,
which is a text node). When sending the update request, the editing agent
sends, inter alia, the identifier of the parent node <biography></biography> (the
node identifiers are not indicated in Diagram 2, but it is assumed here that the
nodes are all annotated for this illustration) to the source document agent,
instead of the identifiers of any of the sibling element nodes. The source document
agent is, in this case, programmed to create a new child element node under
the identified parent node, and to insert the title as a text node into the new
element node.
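The two insertion strategies, referencing a sibling or referencing a parent, can be sketched as below. The function names and the uid attribute are hypothetical, the parent identifier ID20 is invented for illustration, and inserting after (rather than before) the sibling is one arbitrary choice among those the text leaves to the developer. The new nodes are created without identifiers, like node 443, and would be annotated before re-transformation.

```python
import xml.etree.ElementTree as ET

UID = "uid"  # hypothetical annotation attribute


def by_id(root, node_id):
    for el in root.iter():
        if el.get(UID) == node_id:
            return el
    raise KeyError(node_id)


def insert_after_sibling(root, sibling_id, new):
    """Sibling strategy: place the new node directly after the referenced one."""
    sibling = by_id(root, sibling_id)
    parents = {child: parent for parent in root.iter() for child in parent}
    parent = parents[sibling]
    parent.insert(list(parent).index(sibling) + 1, new)


def append_under_parent(root, parent_id, new):
    """Parent strategy: add the new node after all existing children."""
    by_id(root, parent_id).append(new)


def new_node(tag, text):
    el = ET.Element(tag)  # deliberately un-annotated, like node 443
    el.text = text
    return el


fig4 = ET.fromstring('<doc><d uid="ID1">A\'</d><d uid="ID3">C</d></doc>')
insert_after_sibling(fig4, "ID1", new_node("d", "D"))

bios = ET.fromstring(
    '<biography uid="ID20"><bio uid="ID21">Mr Chew\'s Other Life</bio></biography>'
)
append_under_parent(bios, "ID20", new_node("bio", "The Life Of Claude"))
```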
The purpose of these examples is not to specify all the updating mechanisms
that can be devised, but to show that node identifiers can be used to facilitate
the design of such mechanisms. Naturally, the node insertion/creation
command sent to the source document agent must specify enough detail to
let the source document agent know exactly how and where to act, but
other mechanisms will be apparent to one skilled in the art once aware of this
disclosure.

Referring now back to Fig 4, the modified source XML document 40b therefore
has the node containing A' with the identifier 'ID1', the new node containing D
without an identifier, and the node containing C with the identifier 'ID3', which
was not extracted to the transformed document 44 and has never been
modified.


The updated XML source document 40b is then subjected to the same
transformations 41, 42, 43 before being displayed 44 with the updated data.
Just before re-transformation, only the new node 404 is subjected to
annotation. Existing nodes, which already have identifiers, are not re-annotated.
This ensures persistence of the identifiers.

The update/re-transformation cycle continues until the viewer is satisfied with
the modifications.

There is a possibility that a newly added node may not be reflected in the
updated display showing the re-transformed document. This may be due to
reasons such as the selection criteria of the transformation/query scripts 41, 42,
43. For example, if the transformation/query scripts 41, 42, 43 include a criterion
of selecting only text nodes having a string starting with the letter 's', and if
the text in the new node 404 starts with the letter 't' (or if the updating of an
existing text node changes the text therein to one which starts with the letter
't'), then the new data would not be selected during re-transformation and would
not be sent to the editing agent. In such a situation, several strategies may be
employed to assure the viewer that the node insertion/updating was successful.
For example, the editing agent may be programmed to be alerted whenever a new
node has been inserted or an existing node updated, and thus to 'look out' for
the new data when the display is refreshed. If the new information does not
appear after re-transformation, the editing agent may send a confirmation
request to the source document agent, which may then invoke a pop-up message
to the viewer confirming successful insertion/updating of the new data.
Alternatively, the selection criteria of the transformation scripts 41, 42, 43 may
be bypassed for the inserted or updated node, such that the display 44 will show
the new data regardless of the selection criteria in the transformation/query
scripts 41, 42, 43. A simpler alternative is to program the editing agent to
always warn the viewer that an update may not be reflected in the display due to
transformation script selection criteria, and that the viewer ought to take other
measures to check that the data is inserted, such as running a 'select all' script.
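The 'look out' and 'bypass' strategies can be sketched as follows, assuming for simplicity that the source is a flat list of `(identifier, text)` pairs and that the transformation is a filter with the "starts with 's'" criterion from the example. The function names are hypothetical.

```python
def transform(nodes, criterion, bypass=frozenset()):
    """Select nodes satisfying `criterion`, but always keep identifiers listed in `bypass`."""
    return [(nid, text) for nid, text in nodes if criterion(text) or nid in bypass]


def missing_after_refresh(edited_ids, transformed):
    """Identifiers the editing agent was looking out for but cannot find in the display."""
    shown = {nid for nid, _ in transformed}
    return set(edited_ids) - shown


def starts_with_s(text):
    return text.startswith("s")


source = [("ID1", "sea"), ("ID3", "sand"), ("ID4", "tide")]   # ID4 was just inserted

plain = transform(source, starts_with_s)
print(missing_after_refresh({"ID4"}, plain))       # {'ID4'}: ask the agent for confirmation

shown = transform(source, starts_with_s, bypass={"ID4"})
print([nid for nid, _ in shown])                   # ['ID1', 'ID3', 'ID4']
```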

If the transformed document is a combination of several source documents, the
insertion of nodes is even more complicated. For example, if the sibling nodes
in the transformed document all come from different source documents, rules
will have to be programmed into the system such that the editing agent knows
which of those nodes (the neighbouring node on the left, right, top or bottom in
the XML document) is to be considered the sibling node, and so sends the correct
identifier as a reference to the source document agent, so that the correct
source document may be updated with the new data. Other selection criteria for
the sibling node may be used, for example, looking for a node containing some
specific text or a number within a certain value range, and so on.

A simplified example of how an insertion action may be performed by referencing
or 'anchoring' to a sibling node is shown in Figure 5.
Figure 5 shows two element nodes defined by <elmt> tags in a source
document 51. Closing tags are not shown for the sake of simplicity. On
transformation, the <elmt> tags are annotated with identifiers 1000 and 2000
52; the information in the <elmt> tags is then output into the transformed
document 53, with the source elements re-tagged as <Telmt> tags. In the
transformed document, a new <Telmt> tag is created in between the existing
<Telmt> elements 54, and the creation is reflected in the source document by a
corresponding creation of a new <elmt> between <elmt:1000> and <elmt:2000>
55. The location in the source document in which the new node is created can
be obtained by referencing the anchor node (e.g. the node having identifier
1:1000) when sending an update request to the source document agent, or may
be programmed into the rules of a user interface to always refer to some other
particular sibling node. Before re-transformation, the new <elmt> tag is
annotated with identifier 2500 56. On re-transformation, a new node,
<Telmt:2500>, is created in the transformed document with the identifier 2500.
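The anchoring step of Figure 5 can be sketched in a few lines, assuming the sibling elements at one level are kept as an ordered list of `(identifier, value)` pairs. The helper names are hypothetical, and the value 2500 is simply passed in to mirror the figure rather than derived by any particular rule.

```python
def insert_after(siblings, anchor_id, value):
    """Insert an un-annotated element immediately after the sibling identified by anchor_id."""
    ids = [nid for nid, _ in siblings]
    pos = ids.index(anchor_id) + 1            # raises ValueError if the anchor no longer exists
    siblings.insert(pos, (None, value))
    return pos


def annotate(siblings, next_id):
    """Give every un-annotated element an identifier; existing identifiers are kept."""
    for i, (nid, value) in enumerate(siblings):
        if nid is None:
            siblings[i] = (next_id, value)
            next_id += 1
    return next_id


elmts = [(1000, "first"), (2000, "second")]
insert_after(elmts, 1000, "new")
annotate(elmts, next_id=2500)                 # 2500 mirrors the value used in Figure 5
print(elmts)  # [(1000, 'first'), (2500, 'new'), (2000, 'second')]
```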

Optionally, the developer may set various constraints on the modifications that
can be performed by the source document agent, such as a user's access
(read/write) rights to an entire source document or even to particular node(s)
in the document. Other constraints may come from business logic or from the XML
schema of the source document, and so on.

Generally, in order that the transformed documents are annotated with the
correct identifiers in the correct nodes, the transformation agents or query
agents need to be able to transfer the source node identifiers to the nodes in
the transformed documents. Any node in the transformed document which contains
editable information from the source document (such as text, elements and
attributes) should contain the same identifier as its corresponding source
node.
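A sketch of identifier transfer during a transformation, using Python's standard `xml.etree.ElementTree`. Because the annotation syntax described here is not plain XML, the sketch assumes the identifier is stored in an ordinary attribute named `nid`; that attribute name and the `transform` function are illustrative stand-ins, not the described syntax or engine.

```python
import xml.etree.ElementTree as ET

SOURCE = '<doc><elmt nid="1000">alpha</elmt><elmt nid="2000">beta</elmt></doc>'


def transform(source_xml: str) -> ET.Element:
    """Re-tag each <elmt> as <Telmt>, carrying its source identifier across."""
    src = ET.fromstring(source_xml)
    out = ET.Element("result")
    for elmt in src.iter("elmt"):
        t = ET.SubElement(out, "Telmt")
        t.text = elmt.text                    # editable data copied from the source node...
        if "nid" in elmt.attrib:
            t.set("nid", elmt.get("nid"))     # ...so it must carry the same identifier
    return out


result = transform(SOURCE)
print(ET.tostring(result, encoding="unicode"))
# <result><Telmt nid="1000">alpha</Telmt><Telmt nid="2000">beta</Telmt></result>
```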
As illustrated in the above examples, the underlying mechanism of the
embodiments lies in the annotation of the nodes in the XML source document(s)
301, 302, 303 with unique identifiers. The unique source node identifiers can be
numerals, text strings or both, or any other data type (such as a binary format or
binary representation). The node identifiers must be unique for all the nodes in
the XML documents that are to be used together in a transformation (preferably
even globally unique across all existing XML documents). A node's identifier
preferably persists throughout the existence of the node, even when updating
actions performed on neighbouring nodes change the XML document structure.

Figure 6 illustrates how an XML document may be transformed several times and
yet retain links to the source document through node identifiers. The final
Document 7, despite having been through two transformations 3, 6 and being a
combination of Documents 1, 2, 5, has annotations traceable back to the
original source Documents 1, 2, 5. As shown, the node <elmtF:1:10/> in
Document 7 came from Document 4, element node <elmtD:1:10/>, which in turn
came from Document 1, element node <elmtA:10/> (the highlighted fonts represent
the identifiers).

Similarly, the node <elmf:5:10/> is traceable to node <elmtE:10/> of Document
5, based on the identifier '5:10'. The names of the element tags after each
transformation do not matter as long as the identifiers persist through
transformations.
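The following sketch mimics the Figure 6 chain with plain tuples, assuming that the annotation step qualifies each local node ID with a source document alias ("1:10", "5:10") and that each transformation may rename tags but passes identifiers through untouched. Document 2 is omitted and the tag names are illustrative; the function names are hypothetical.

```python
def annotate_with_alias(nodes, alias):
    """Annotation step: qualify each local node ID with its source document alias."""
    return [(f"{alias}:{nid}", name) for nid, name in nodes]


def rename(nodes, mapping):
    """A transformation that may rename element tags but never alters identifiers."""
    return [(nid, mapping.get(name, name)) for nid, name in nodes]


def trace(nid):
    """Split a qualified identifier back into (source document alias, local node ID)."""
    alias, local = nid.split(":")
    return int(alias), int(local)


doc1 = [(10, "elmtA")]   # Document 1
doc5 = [(10, "elmtE")]   # Document 5

doc4 = rename(annotate_with_alias(doc1, 1), {"elmtA": "elmtD"})        # first transformation
doc7 = rename(doc4 + annotate_with_alias(doc5, 5), {"elmtD": "elmtF"})  # second, combining sources

print(doc7)               # [('1:10', 'elmtF'), ('5:10', 'elmtE')]
print(trace(doc7[1][0]))  # (5, 10)
```

Because the two local IDs are both 10, the alias prefix is what keeps them distinct after the documents are combined.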

Specifically, identifiers are persistent through the following updating actions:
i. when another node which is above, below or in the same level in the node
hierarchy is modified, inserted or deleted;
ii. when the order of the sibling nodes (nodes in the same branch and at the
same hierarchy level) is re-arranged;
iii. when an attribute, e.g. a formatting attribute, of an element is modified,
created or deleted from the element.

Further preferable features 

Preferably, node identifiers are not recycled. In other words, a newly added
node should not re-use any identifiers that have been used by deleted nodes.

Furthermore, nodes in the transformed document should preferably bear only the
unique identifiers of the source nodes and should not have their own unique
node identifiers generated. That is, a document should not contain unique
identifiers of another source document and different unique identifiers of its
own at the same time, or the system may be confused or rendered unnecessarily
complicated.

Preferably, the identifier syntax is Implemented in such a way that the 
conventional XML data model or info set model is not changed by the 
annotations, even though each node would now have an associated identifier. 
In other words, the present mechanisms of XML parsing/processing is not 
hampered or disturbed by the annotations, so that standard XML parsers may 
work on annotated XML documents without needing to be modified. 

The source document agent preferably has rollback functions built into its
design, to enable the update actions on the source documents to be undone
when requested. The strategies for implementing rollback functions are well
known to the man skilled in the art and will not be discussed further.

In one variation, the source document 40a, after updating, becomes a separate
source document 40b, and the original source document 40a may then be
archived.
In a further variation, compression mechanisms may be used to minimise the
amount of data communicated after re-transformations, for example, using XDiff
to find the differences between the XML documents before and after the updates
and sending only the changed parts to the editing agent for refreshing the
display.
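Since every editable node carries a persistent identifier, a change set can be computed by comparing identifiers rather than tree positions. The sketch below is a simplified stand-in, not XDiff: it assumes each version of the transformed document has been flattened into an `{identifier: content}` dictionary, and `keyed_diff` is a hypothetical name.

```python
def keyed_diff(before: dict, after: dict) -> dict:
    """Compare two {identifier: content} snapshots of a transformed document."""
    return {
        "inserted": {k: after[k] for k in after.keys() - before.keys()},
        "deleted": sorted(before.keys() - after.keys()),
        "changed": {k: after[k] for k in after.keys() & before.keys() if after[k] != before[k]},
    }


before = {"ID1": "A", "ID2": "B", "ID3": "C"}
after = {"ID1": "A'", "ID3": "C", "ID4": "D"}
print(keyed_diff(before, after))
# {'inserted': {'ID4': 'D'}, 'deleted': ['ID2'], 'changed': {'ID1': "A'"}}
```

Only the three entries in the result would need to be sent to the editing agent; the unchanged node ID3 is not transmitted.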

As described earlier, due to the passing-on (or inheritance) of the identifiers 
between transformed documents during a multi-transformation step, the nodes 
in a final transformed document have the same identifiers as the source nodes. 

However, in an alternative embodiment, the system may be designed in such a
way that transformed documents intermediate in the multi-transformation step
have identifiers different from other intermediate transformed documents. This
will require, however, a 'middle-man' database mapping/tracing the identifiers
from each transformation to the next, in order to link the nodes between each
intermediate transformed document, the final transformed document and the
source document. The design can be used to facilitate debugging by identifying
which of the several transformation scripts has caused a problem, if any. It can
also be used to provide a proxy or firewall interface that protects the nodes in
the source documents from direct access by hackers who may use the annotations
to trace the data in the source document.
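A minimal sketch of such a 'middle-man' mapping, assuming one lookup table per transformation step that maps an identifier in that step's output to the identifier in its input. The table contents and identifier formats (`t1-7`, `src-101`) are invented for illustration.

```python
# One table per transformation step: identifier in that step's output -> identifier in its input.
step_maps = [
    {"t1-7": "src-101", "t1-8": "src-102"},   # source document -> intermediate (script 1)
    {"t2-3": "t1-7", "t2-4": "t1-8"},         # intermediate -> final document (script 2)
]


def trace_to_source(final_id: str, maps: list) -> list:
    """Walk the mapping tables backwards and return the chain of identifiers."""
    chain = [final_id]
    for table in reversed(maps):
        # A KeyError here pinpoints the step whose mapping is broken (useful for debugging).
        chain.append(table[chain[-1]])
    return chain


print(trace_to_source("t2-3", step_maps))  # ['t2-3', 't1-7', 'src-101']
```

Only the holder of the tables can perform this walk, which is what allows the mapping to act as a proxy between the published identifiers and the source nodes.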

The mechanism described in the embodiments so far relies mainly on three major
modules or components: the source document agent, the transformation or query
agent, and the editing agent. These modules can be implemented in a distributed
intranet or Internet environment.

Furthermore, they can be arranged locally or remotely in a distributed
environment. Where implemented remotely, the source document agent and the
transformation (or query) agent are grouped together in a source document
management system. In a 'fat-client' arrangement, the source document
management system is responsible for storing the source documents and
wo 2005/031498 PCT/SG2003/000235 

transformation scripts, generating the globally unique source node identifiers,
performing transformations or queries and updating the source documents upon
requests from the editing agent. The source document management system
functions as a server in the distributed environment, having a role which is
analogous to a conventional web server combined with a relational database
management system. In such a remote network system, the editing agent, which
resides on the client side, has a role similar to a conventional web browser
but has further functions for data editing interactions. As the client does not
need to support complicated transformations such as source document updating
and management, this arrangement is advantageous for 'thin-client'
implementations, such as kiosks, PDAs and mobile phones. As variations, the
client and server can both exist as different processes on the same machine, or
on two different machines across an intranet or the Internet. All that is
needed is an implementation of a communication protocol to transmit the
transformed documents and the various action requests and feedback between
the source management, transformation and editing agents.

Where implemented in a local system, the source document agent, the transform
or query agent and the editing agent all run within the same process on the
same machine. This may be the implementation choice when the source documents
and transform scripts are meant to be available in one system, or when they can
be completely or partially downloaded from a server onto a local machine for
processing. The source document can therefore be edited and updated completely
on the client side without any intermittent server interaction and
communication. Where the XML documents and transformation scripts are
downloaded to the client side for complete processing, upon completion of the
updating or other processing the client resubmits the source documents to the
server to completely replace the original source documents. The server's role
is therefore simplified to fetching data and indicating the success or failure
of the substitution of the source documents, while the client performs all the
processing. The communication between the source agents and the editing agent,
being now completely within the client machine, can be simplified to function
calls instead of complicated communication protocols.

A further implementation is one which has multi-user concurrent access. This
means that the source documents are not required to be locked during editing,
i.e. a source document can be shared by any number of people, or be used by
one person working on multiple editing applications. This is possible because
user editing actions can be broken down into a sequence of
select/insert/delete/update operations on individual node(s) in the source
documents, in a way which is very similar to SQL statements in a relational
database.

Programmable Stored Procedures in the Source Document Agent

At times, there might be a need to carry out a sequence of updating actions
which can be packaged into a set of stored procedures in the source document
agent (similar to stored procedures in SQL or macros in MS Word). It is left to
the developer implementing this invention to design the stored procedure
language and syntax. An example of the syntax of an action invoking a stored
procedure in the source document agent is given below:

invoke foo(param1, param2, 1:1002, 2:2003) on document "abc.xml"
using document 1 from "def.xml", 2 from "ghi.xml"

The editing agent may invoke stored procedures by passing parameters. The
parameters may contain literal values and the relevant unique source node
identifiers.
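Since the stored procedure syntax is left to the implementer, the following is only one possible reading of the example invocation: arguments of the form `alias:nodeID` are treated as node identifiers whose alias is resolved through the `using document ... from ...` clause, and everything else is a literal. The regular expressions and `parse_invoke` are hypothetical.

```python
import re

INVOKE = re.compile(
    r'invoke\s+(?P<proc>\w+)\((?P<args>[^)]*)\)\s+on\s+document\s+"(?P<doc>[^"]+)"'
    r'(?:\s+using\s+document\s+(?P<aliases>.+))?'
)
ALIAS = re.compile(r'(\d+)\s+from\s+"([^"]+)"')
NODE_REF = re.compile(r'^(\d+):(\d+)$')


def parse_invoke(command: str) -> dict:
    m = INVOKE.fullmatch(command.strip())
    if m is None:
        raise ValueError("not an invoke command")
    aliases = dict(ALIAS.findall(m["aliases"] or ""))   # e.g. {'1': 'def.xml', '2': 'ghi.xml'}
    args = []
    for raw in (a.strip() for a in m["args"].split(",")):
        ref = NODE_REF.match(raw)
        # "1:1002" is read as node 1002 of the document with alias 1; anything else is a literal.
        args.append(("node", aliases[ref[1]], int(ref[2])) if ref else ("literal", raw))
    return {"procedure": m["proc"], "document": m["doc"], "args": args}


cmd = ('invoke foo(param1, param2, 1:1002, 2:2003) on document "abc.xml" '
       'using document 1 from "def.xml", 2 from "ghi.xml"')
print(parse_invoke(cmd)["args"])
# [('literal', 'param1'), ('literal', 'param2'), ('node', 'def.xml', 1002), ('node', 'ghi.xml', 2003)]
```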

Generally, the well-known CGI (Common Gateway Interface), which supports
server-side scripting, can be utilised to build the request/response mechanism.
There is no limitation on which language or facility is to be used to implement
the stored procedures. Existing web-related languages like Java, ASP, C#,
PHP, or any future action language defined specifically for XML can be used.



Identifiers in source documents 

The unique node identifiers will now be described in further detail. 

In a certain embodiment, an identifier may be made up of three parts: an 
Originator ID, a Document ID and a Node ID. The composition of the identifier 
may be varied depending on the preferences of the system programmer, but the 
identifier should generally impart sufficient uniqueness to the nodes. 

The Originator ID identifies the person or originator who created or owns the
XML document. In a business environment, the Originator ID may be the
domain name or primary IP address of the company or organisation. For large
enterprises, the Originator ID may be composed of a hierarchy of IDs, e.g.
'Domain ID + Sub-domain ID + Group ID + ... + Server ID + User ID' and so on.
In an end-user environment, where not every user has a domain name, the
Originator ID may even be the originator's email address.

Preferably, the Document ID is a serial number. 

Optionally, a separate database may be created to map Document IDs to the 
file paths or URLs of source documents, by which the source document agent 
may locate the source document for data updating. 

As documents are often moved around in a file system and ported from system
to system, it is also preferable that the Document ID is designed in a way that it
is persistent despite file transfers between different systems, i.e. the IDs are
not descriptively tied to the system in which the documents reside.
The last component of the identifier is the Node ID, which is typically a number
that is generated incrementally.

The three IDs are combined to give a unique identifier. However, depending on
the system designer's preference or the resources available to generate unique
identifiers, other combinations of identifying components may be used.
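A sketch of the three-part identifier, assuming the textual form `originator/document:node` (modelled loosely on the header declaration shown later, but otherwise an assumption) and a per-document counter for the Node ID. `SourceNodeId` and `NodeIdGenerator` are hypothetical names.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceNodeId:
    originator: str   # e.g. a domain name or an e-mail address
    document: str     # serial number, independent of the file path
    node: int         # generated incrementally within the document

    def __str__(self) -> str:
        return f"{self.originator}/{self.document}:{self.node}"


class NodeIdGenerator:
    """Issues identifiers for one document; values are never reused."""

    def __init__(self, originator: str, document: str, start: int = 1):
        self.originator, self.document = originator, document
        self._next = itertools.count(start)

    def new_id(self) -> SourceNodeId:
        return SourceNodeId(self.originator, self.document, next(self._next))


gen = NodeIdGenerator("peter@abc.com", "doc123456", start=11001)
first, second = gen.new_id(), gen.new_id()
print(first, second)
# peter@abc.com/doc123456:11001 peter@abc.com/doc123456:11002
```

Making the identifier a frozen (hashable) value lets it serve directly as a dictionary key in an agent's node index.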

An example of how the identifier syntax is extended to standard XML tags and
data, in order to allow the unique source node ID to be persistent in source
documents, will now be given.

As a first example, the straightforward way is to insert the entire identifier
into each node. However, depending on the way an identifier is generated, the
identifier may be a very long string. In such cases, since the 'Domain ID +
Document ID' portion of the identifier is identical for all the nodes in each XML
document, instead of repetitively inserting a long identifier in each individual
node of the same XML document, the 'Domain ID + Document ID' part may be
declared in a header of the XML document, while only the Node IDs are
inserted into the nodes.

The syntax of the header may be in standard XML declaration syntax. For
example, if the Originator ID is 'peter@abc.com' and the Document ID is
'doc123456', instead of appending both IDs to each and every node in the
document, they may be declared in a header in the document:

Combined ID declaration: <?UniqueDocumentID peter@abc.com/doc123456 ?>
Normal XML following:    <tags> ...XML tags and data... </tags>

Each node in the document is then annotated only with Node IDs, which are
appended to the XML elements, attributes or processing instructions. The
following shows nodes in an XML document having Node IDs (in bold) placed
after colons:



Parent node:        <namespace:elmt:11001
Attribute node 1:       attr:11003="an attribute with Node ID">
Child node 2:           <elmt:11002/>
Child node 3:           <?pi:11008 a processing instruction with node identifier ?>
Child node 4:           <!--&11006; a comment with node identifier -->
Child node 5:           &11005; a text node with node identifier.
Closing tag:        </namespace:elmt>



The XML standard specifies that XML names cannot begin with numerals;
therefore the syntax in the above-shown parent node and child nodes (1 and 2)
will not confuse XML parsers. However, in the above example, an escape
character, such as an '&' sign, is used to indicate an ID for text. Because this
symbol is used for XML data and comments (child nodes 3 and 4), the annotated
document no longer has valid XML syntax. An XML parser to be used on the
annotations might have to be modified to recognise and parse the extended
syntax.

In an alternative form, the annotation is performed by mapping node positions in
an XML processing instruction tag. The XML code below shows Node IDs in an
instruction tag, with a list of Node IDs mapped to child node positions:



Node 1:                   <?UniqueNodeMapping 2=11001 3=11003 4=11002 5=11008 6=11006 7=11005 ?>
Node 2 mapped to 11001:   <namespace:elmt
Node 3 mapped to 11003:       attr="an attribute with a node identifier">
Node 4 mapped to 11002:       <elmt/>
Node 5 mapped to 11008:       <?pi a processing instruction with a node identifier ?>
Node 6 mapped to 11006:       <!-- a comment with a node identifier -->
Node 7 mapped to 11005:       a text with a node identifier.
                          </namespace:elmt>

The first node shown above lists the Node IDs for each of the child nodes in the
document. However, if a node is inserted or deleted, the entire mapping has to
be re-generated. An advantage of this type of 'header mapping' is that the entire
annotated source document remains a valid XML document despite the insertion of
the identifiers in the 'header' using the syntax of an XML processing
instruction. Therefore, existing XML parsers and other processors can be used
on the source documents without modification. However, such processors should
only be granted read-only access, and not write access, to the source
documents (for example by making the file read-only), as they might remove the
first node containing the important node-identifier mapping information.

Of the two types of annotation described above, the former scheme requires the
XML parser/application to understand the annotations, so there is no risk of
external applications inadvertently corrupting the identifiers in some
operations. As the annotation is localised, i.e. inserted directly into each
node, non-DOM-based XML parsers, like SAX, can be implemented which allow parts
of the XML tags to be skipped during parsing. In the latter scheme, the syntax of
an XML tag that is normally used to contain processing instructions is used to
contain the header mapping. In this case, existing XML parsers and processors
can be used (for read-only access) without modification.
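To illustrate the claim that the header-mapping form stays valid XML, the sketch below reads the mapping with Python's unmodified `xml.dom.minidom` parser. The document is a simplified copy of the example (the namespace prefix is dropped so that no namespace declaration is needed), and `read_node_mapping` is a hypothetical helper.

```python
from xml.dom import minidom

ANNOTATED = (
    '<?UniqueNodeMapping 2=11001 3=11003 4=11002 5=11008 6=11006 7=11005 ?>'
    '<root attr="an attribute"><elmt/><?pi an instruction?><!-- a comment -->a text node</root>'
)


def read_node_mapping(xml_text: str) -> dict:
    """Parse with a standard DOM parser and return {position: node ID} from the header PI."""
    doc = minidom.parseString(xml_text)
    for child in doc.childNodes:          # prolog PIs are children of the Document node
        if (child.nodeType == child.PROCESSING_INSTRUCTION_NODE
                and child.target == "UniqueNodeMapping"):
            pairs = (item.split("=") for item in child.data.split())
            return {int(pos): int(nid) for pos, nid in pairs}
    return {}                             # no header: the document is not annotated


print(read_node_mapping(ANNOTATED))
# {2: 11001, 3: 11003, 4: 11002, 5: 11008, 6: 11006, 7: 11005}
```

Note that any tool which re-serialises the document without its prolog would silently drop this mapping, which is why the text recommends read-only access for such processors.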

Identifiers in transformed documents

Variations of the annotation in transformed documents are similar to those
described above for source documents.
For a transformed document which is a combination of several source
documents, the example below shows how the several Document IDs in the
transformed document may be declared using prefixes, in a similar way to how
namespaces are declared and referred to in standard XML syntax:

<?UniqueSourceDocumentID peter@abc.com/doc123456 as 1 ?>
<?UniqueSourceDocumentID tom@abc.com/doc1111111 as 2 ?>

The declaration shows the IDs of two source documents, which were used to
form a transformed document, having the aliases '1' and '2'. During the
annotation step before transformation, the source document agent assigns one of
the two aliases, '1' or '2', to the nodes of each respective document. Each node
in the transformed document may therefore contain an alias instead of an entire
source ID, linking each node in the transformed document correctly to one of
the source documents.
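A sketch of how an agent might expand an alias-qualified annotation back into a full source identifier, assuming the declarations are read with a simple regular expression and that annotations have the form `alias:nodeID`. Both helpers are hypothetical.

```python
import re

HEADER = """<?UniqueSourceDocumentID peter@abc.com/doc123456 as 1 ?>
<?UniqueSourceDocumentID tom@abc.com/doc1111111 as 2 ?>"""

DECL = re.compile(r"<\?UniqueSourceDocumentID\s+(\S+)\s+as\s+(\S+)\s*\?>")


def read_aliases(text: str) -> dict:
    """Map each declared alias to its full source document ID."""
    return {alias: source for source, alias in DECL.findall(text)}


def resolve(annotation: str, aliases: dict) -> tuple:
    """Expand an 'alias:nodeID' annotation into (source document ID, node ID)."""
    alias, node = annotation.split(":", 1)
    return aliases[alias], node


aliases = read_aliases(HEADER)
print(resolve("2:11002", aliases))   # ('tom@abc.com/doc1111111', '11002')
```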

As shown above, the name of the processing instruction is preferably
UniqueSourceDocumentID instead of UniqueDocumentID, to emphasise to the
programmer that the identifiers indicated in the tag are those of source
document(s).

An example of identifier syntax that annotates each individual node is given
below:



Example of transformed document

XHTML node transformed from document 1:           <div:1:11001
Attribute that is not bound to any source node:       attr="...">
XHTML node transformed from document 2:               <img:2:11002/>
XHTML text node extracted from document 1:            &1:11005; text node
                                                  </div>



In an annotation scheme corresponding to the one given earlier for source
document annotation, an annotated 'header tag', such as an XML processing
instruction node, may be used to map child nodes, by declaring a mapping
between nodes and the Node IDs:

XML processing instruction node:          <?UniqueSourceNodeMapping 2=1:11001 4=2:11002 6=1:11005 ?>
Transformed node 2 from document 1:       <div
Transformed node 3 without binding:           attr="...">
Transformed node 4 from document 2:           <img/>
Transformed text node 5 from document 1:      text node
                                          </div>



Preferably, the method is implemented in such a way that there is no need to
amend the structure of the transformation script (e.g. XSLT or XQuery) despite
the identifiers. The transformation/query agent (or engine) is simply modified to
perform extra steps to transfer the source node identifiers into the transformed
document during a transformation/query.

In general, there are two main types of transformation actions which must be
accompanied by a transfer of identifiers:

1. Steps or processes that require establishing a current node. For example, if
the transformation process includes a loop operation or an application of a
template, the transformation engine must be able to identify the (reference)
node from which the operation began, i.e. the innermost loop in a nested
loop, so that reference may be made to that node when looping or applying a
template.

2. Steps that output data from source nodes into the transformed document.
Such data may be subjected to modification by the viewer, and the nodes
containing the data must therefore be traceable to the source document.
Editing agent

The updating mechanism behind the editing agent 220 will now be elaborated
upon.

An updating request is sent by the editing agent via a communication standard
to the source document agent. It is left to the developer implementing the
method to define the communication standard and to choose the
communication protocol. For example, the communication might use a
SOAP/XML protocol built on TCP/IP, a remote procedure call, or a normal
function call if the source document agent and editing agent both reside in the
same system.

Generally, an update request should contain three specific pieces of
information:

1. The action to perform, e.g. to insert a node, delete a node, or update a node;

2. The node to modify, i.e. using the unique source node identifier of the node
to be modified;

3. The value to use: if the action is an update action, then the new value has to
be supplied.
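Because the communication standard is left to the implementer, the following is only a sketch of one possible request shape: a Python data class carrying the three pieces of information, serialised to JSON. The field names, the set of allowed actions and the use of JSON are all assumptions.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class UpdateRequest:
    action: str                   # "insert", "delete" or "update"
    node_id: str                  # unique source node identifier of the node to modify
    value: Optional[str] = None   # new value; mandatory for update actions

    def __post_init__(self):
        if self.action not in {"insert", "delete", "update"}:
            raise ValueError(f"unknown action {self.action!r}")
        if self.action == "update" and self.value is None:
            raise ValueError("an update request must supply the new value")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


req = UpdateRequest("update", "1:1003", "Hello World")
print(req.to_json())  # {"action": "update", "node_id": "1:1003", "value": "Hello World"}
```

Validating the request on the editing-agent side catches malformed requests before they reach the source document agent, although the source document agent would still need its own checks.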

Editing interfaces (i.e. user interfaces) can be broadly classified into two
categories: console-based and rich-formatted (in other words, WYSIWYG).

In a command console type interface, the editing agent presents the
transformed document to the user, and provides a command prompt (like the
MS-DOS prompt) for the user to type in commands. Such a crude interface is very
easy to implement, very robust and flexible, but has limited user friendliness.
Many existing shells and scripting programs, such as MS-DOS on Windows or
shell scripts on UNIX, can be used to send console commands.

An example of how the syntax of action commands may look is shown below:

delete node 1002 from document "xyz.xml"
update node 1003 to "Hello World" from document "xyz.xml"
copy node 1004 from "abc.xml" before node 1001 from "def.xml"
move node 1004 from "abc.xml" into node 1001 from "def.xml"
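A console editing agent could turn such commands into structured requests with very little code. The sketch below uses the standard `shlex` module to honour the quoted values; the fixed token positions reflect one assumed reading of the example grammar, and `parse_command` is hypothetical.

```python
import shlex


def parse_command(line: str) -> dict:
    """Turn one console command into a structured request (assumed grammar)."""
    t = shlex.split(line)                    # quoted strings become single tokens
    verb = t[0]
    if verb == "delete":                     # delete node N from document "F"
        return {"action": "delete", "node": int(t[2]), "document": t[5]}
    if verb == "update":                     # update node N to "V" from document "F"
        return {"action": "update", "node": int(t[2]), "value": t[4], "document": t[7]}
    if verb in ("copy", "move"):             # copy|move node N from "F" before|into node M from "G"
        return {"action": verb, "node": int(t[2]), "document": t[4],
                "position": t[5], "target": int(t[7]), "target_document": t[9]}
    raise ValueError(f"unknown command: {verb}")


print(parse_command('update node 1003 to "Hello World" from document "xyz.xml"'))
# {'action': 'update', 'node': 1003, 'value': 'Hello World', 'document': 'xyz.xml'}
```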

Referring now to rich-formatted editing interfaces, most XML display applications
for advanced presentation, such as markup presentation scripts like XHTML,
XSL, SVG, SMIL and MathML, are meant to be used for displaying the data and
not for user-interactive data editing.



In one embodiment of the editing agent, the XML transformed document may
contain a new tag element defined to implement user-defined interactivity, such
as a <listener> element. The <listener> element may be used to contain and
invoke coded routines, such as those responding to mouse clicks or other events. The
<listener> element may also be attached to or nested inside presentation
elements in XHTML, SVG, etc., in order to provide interactive capabilities to
otherwise static presentations (W3C's XML Events specification can be
referenced to design such event handling elements).

One limitation of the source annotation described so far is that the annotation
performed by the source document agent is of a 'have-all' or 'have-none' type,
i.e. source annotation is performed on all of the source nodes or not performed
at all. In a user-rights controlled editing interface, the XML document owner may
want to 'lock' certain parts of the same document, so that viewers can view
those parts of the document without being able to edit them. In this case the
XML document administrator or author may 'switch off' the source annotation
functions in the source document agent, or 'switch off' the identifier transfer
functions in the transformation agent, on the read-only parts of the document.
As the resultant transformed document contains no identifiers in the read-only
parts, there is no way the editing agent can send effective editing requests to
the source document agent on the un-annotated nodes. Only those nodes in the
transformed document that have node identifiers support editing actions. Data
residing in nodes without identifiers is displayed as 'read-only' and cannot be
modified, since there are no identifiers linking it to the source document.

The editing agent may be embedded in a web browser such as Internet
Explorer or Netscape. Alternatively, it may be a proprietary user-interactive
display programmed by the developer implementing the system. Generally, the
display must allow a viewer who has read/write access rights to see and
interact with the data in the transformed document underlying the presentation
(such as through forms and text boxes in a browser and so on), and to trigger
the editing agent to activate the update requests. Any interactive web
techniques may be used to facilitate viewer interaction, for example, Java
applets. The specific presentation format may be defined by a stylesheet, as is
common for marked-up language documents.

One variation of the selective node annotation described above is where the
selection is done by the transformation/query agent. A fully annotated data
source may not have all its identifiers transferred to the transformed version of
the data by reason of some criteria in the transformation/query script. Therefore
the transformed document has parts which are read-only when displayed in a
user interface or by the editing agent.

Three pieces of information that have to be supplied by the editing agent/GUI to
the source document agent in an update command are:

• Identifier of the target node, i.e. the innermost unique source node
identifier in the parent or ancestor nodes. This is the node targeted for the
update.

• Update action or command

• Parameter values that are necessary for the update. If the event is a mouse
click, the parameters might be the x, y coordinates relative to the visual
display, the number of clicks, and the button clicked (left, middle, right). If it is a
key event, the parameters might be the key code, repeat count, etc.
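Finding the target node amounts to walking up from the element under the pointer until an annotated element is found. The sketch below uses `xml.etree.ElementTree`, which has no parent pointers, so a child-to-parent map is built first; storing the identifier in an attribute called `nid` is an assumption standing in for the annotation syntax.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DISPLAY = ('<table><tr nid="1:1001"><td>TCP/IP Illustrated</td>'
           '<td><b>65.95</b></td></tr></table>')


def innermost_identifier(root: ET.Element, target: ET.Element) -> Optional[str]:
    """Return the identifier of target or of its nearest annotated ancestor (None = read-only)."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        if "nid" in node.attrib:
            return node.get("nid")
        node = parent_of.get(node)
    return None


root = ET.fromstring(DISPLAY)
clicked = root.find(".//b")                 # element under the mouse pointer
print(innermost_identifier(root, clicked))  # 1:1001
```

A `None` result corresponds to the read-only case discussed earlier: there is no identifier to send, so no update request can be formed.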
The editing agent may be programmed to maintain editing states after
transformation. For example, after the source document has been edited and
the data is updated and displayed to the user, the focus of the display is
returned to the very node/data that had the focus before the updating command
was issued. In other words, the editing agent has state persistency. The editing
agent achieves this by recording the identifiers of the nodes and their display
states, such as 'selected' or 'focused'. Therefore, no matter how the nodes in
the result document are updated, rearranged or restructured, the editing agent
maintains state persistency in the display, for example by re-focusing the screen
cursor onto the node which last had the focus, or by re-positioning a scrollbar
whose position is defined by a node as it was before an updating action and
re-transformation (provided that the node has not been deleted). The persistent
state information can be sent back to the XML document agent at the end of the
editing session and retrieved at the beginning of the next editing session or,
alternatively, it can be retained in the editing agent.
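A sketch of identifier-keyed state persistence, assuming display states are held as `{node identifier: {state name: value}}` and written out in a mock line format (`node <id>: <state>, <name>=<value>`). The state names and both functions are illustrative assumptions.

```python
def save_state(states: dict) -> str:
    """Serialise {node ID: {state: value}}; boolean flags are written by name only."""
    lines = []
    for nid, attrs in states.items():
        parts = [k if v is True else f"{k}={v}" for k, v in attrs.items()]
        lines.append(f"node {nid}: " + ", ".join(parts))
    return "\n".join(lines)


def restore_state(states: dict, ids_after_retransform: set) -> dict:
    """Keep only the states whose node still exists after re-transformation."""
    return {nid: attrs for nid, attrs in states.items() if nid in ids_after_retransform}


states = {
    "1:1001": {"isSelected": True},
    "1:1003": {"isFocused": True, "caretPosition": 2},
    "1:1004": {"scrollbarPosition": 1025},
}
print(save_state(states))
# node 1:1001: isSelected
# node 1:1003: isFocused, caretPosition=2
# node 1:1004: scrollbarPosition=1025

# Suppose node 1:1004 was deleted by the last update: its state is silently dropped.
print(restore_state(states, {"1:1001", "1:1003"}))
```

Keying the state by identifier rather than by screen position is what lets it survive re-ordering or restructuring of the transformed document.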

An example of serialisation of state using identifiers is shown in the following
using mock syntax, with reference to the XML document further below:

Serialisation code:

node 1:1001: isSelected
node 1:1003: isFocused, caretPosition=2
node 1:1004: scrollbarPosition=1025

Annotated XML document: 

<?UniqueSourceDocumentID www.bn.om/doc123456 as 1 ?>
<?UniqueSourceDocumentID www.amazon.om/doc123456 as 2 ?>
<html>
<body>
<table border=1 cellspacing=0>
  <tr> <td>Title</td> <td>Price from Amazon</td> <td>Price from BN</td>
  </tr>
  <tr:1:1001>
    <td>TCP/IP Illustrated</td>
    <td>65.95</td>
    <td bgcolor=#ccccff>&1:1003;65.95</td>
  </tr>
  <tr:1:1007>
    <td>Data on the Web</td>
    <td>34.95</td>
    <td bgcolor=#ccccff>&1:1009;34.60</td>
  </tr>
</table>
</body>
</html>

Transformation script modification

A further variation of the embodiment uses one or more other transformation scripts
to transform and modify the transformation scripts currently used on a source XML
document. As XSLT is itself an XML document (and XQuery, while not written in
XML syntax, has an equivalent XML syntax, i.e. XQueryX), transformation scripts can
be transformed in the same manner as source documents. The
transformation scripts may even be modified 'on the fly', i.e. during an editing
session on the source documents. This is possible because this invention does
not require the transformation scripts to remain the same during the editing
process. Figure 7 illustrates the concept.

Referring to Figure 7, source document A is transformed by transformation script B
into the resultant transformed document D. If the viewer prefers that
transformation script B be modified to have different selection criteria, he may
activate transformation script C to transform transformation script B into
transformation script B'. In such a case, the transformation of document A by the
modified script B' will produce transformed document D'.
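Because an XSLT script is an XML document, the step from script B to script B' can be illustrated with an ordinary XML tree edit. In the specification, script C would itself be a transformation script; here a Python function using the standard `xml.etree.ElementTree` module stands in for it. Script B and the new selection criterion are invented for the example.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)   # keep the usual 'xsl' prefix on output

# Hypothetical transformation script B: lists the title of every book.
SCRIPT_B = f"""<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">
  <xsl:template match="/">
    <xsl:for-each select="bib/book">
      <p><xsl:value-of select="title"/></p>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""


def apply_script_c(script_b: str, new_select: str) -> str:
    """Stand-in for script C: rewrite the selection criterion of every
    xsl:for-each in script B and return the modified script B'."""
    root = ET.fromstring(script_b)
    for loop in root.iter(f"{{{XSL_NS}}}for-each"):
        loop.set("select", new_select)   # '<' is escaped on serialisation
    return ET.tostring(root, encoding="unicode")


# Script B' selects only books cheaper than 50.
script_b_prime = apply_script_c(SCRIPT_B, "bib/book[price < 50]")
```

Applying script B' to source document A would then produce the different result D' without any change to A itself.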

Some other variations 

Another embodiment of the invention generates source node
identifiers that are persistent and unique for only one editing session. Instead of
being made up of Originator ID + Document ID + Node ID, the identifier is made
up of only Document ID + Node ID. The Originator ID is dropped.

<?UniqueSourceDocumentID "c:\doc123.xml" as 1 ?>
<?UniqueSourceDocumentID "c:\doc234.xml" as 2 ?>
<?UniqueSourceNodeMapping 1=1:11001 3=2:11002 4=1:11005 ?>
<div attr="...">
  <img/>
  ...
</div>

In this case, the source node identifiers are only intended to be used for one
editing session and are discarded when the editing session is over. As a result,
the source documents can be ported to and from different computers without
any restriction, as they do not use system names or addresses as part of their
IDs. This can be considered a scaled-down embodiment of the invention.
The advantage of this scaled-down embodiment is that the identifiers are
simpler to generate and maintain during an editing session.
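The two identifier forms can be contrasted in a short sketch. The separators and the counter-based generator are assumptions made for illustration; the specification only fixes which components are present. In the session-only form, each document receives a small local number the first time it is seen, and node numbers come from one shared counter, so nothing system-specific is embedded in the identifier.

```python
from itertools import count
from typing import Dict


def persistent_id(originator: str, doc_id: str, node_id: int) -> str:
    """Full form: Originator ID + Document ID + Node ID."""
    return f"{originator}/{doc_id}/{node_id}"


class SessionIdGenerator:
    """Session-only form: Document ID + Node ID (no Originator ID).
    Discarding the generator at the end of the session discards the IDs."""

    def __init__(self, first_node_id: int = 11001):
        self._doc_numbers: Dict[str, int] = {}
        self._node_counter = count(first_node_id)

    def doc_number(self, doc_path: str) -> int:
        # Documents are numbered 1, 2, ... in order of first use.
        if doc_path not in self._doc_numbers:
            self._doc_numbers[doc_path] = len(self._doc_numbers) + 1
        return self._doc_numbers[doc_path]

    def new_id(self, doc_path: str) -> str:
        return f"{self.doc_number(doc_path)}:{next(self._node_counter)}"
```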

Although the examples given are addressed in particular to XML documents,
the method is not restricted in application to XML documents and can be used
on any type of well-defined hierarchical node-based data structure or marked-up
language document. Furthermore, many non-XML data sources, such as HTML and
RTF documents, can first be mapped into an XML-equivalent format before
editing and then converted back to their original format after editing. The method
can also be applied to many other data sources, such as file systems, directory
systems, the Windows registry, and relational and object databases.
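Mapping a non-XML source into an XML equivalent and back can be illustrated with a small round trip. This sketch assumes the source is a nested Python dictionary standing in for a directory tree or registry hive; the element names `dir` and `file` and the attributes `name` and `value` are invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

Tree = Dict[str, Union[str, "Tree"]]


def to_xml(name: str, tree: Tree) -> ET.Element:
    """Map a nested dict to XML: sub-dicts become <dir>, strings <file>."""
    elem = ET.Element("dir", name=name)
    for key, value in tree.items():
        if isinstance(value, dict):
            elem.append(to_xml(key, value))
        else:
            ET.SubElement(elem, "file", name=key, value=value)
    return elem


def from_xml(elem: ET.Element) -> Tree:
    """Convert the (possibly edited) XML back to the nested dict form."""
    result: Tree = {}
    for child in elem:
        if child.tag == "dir":
            result[child.get("name")] = from_xml(child)
        else:
            result[child.get("name")] = child.get("value")
    return result


# Edit the XML form, then convert back to the original format.
source = {"etc": {"hosts": "127.0.0.1 localhost"}, "readme.txt": "hello"}
xml_root = to_xml("root", source)
xml_root.find("file[@name='readme.txt']").set("value", "edited")
edited = from_xml(xml_root)
```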

Although this description uses XSLT and XQuery as examples of
transformation/query scripts, it is not the intention of the inventor to restrict the
use of the teaching to them. There are many other transformation languages
with which the mechanisms described herein can be used.

Furthermore, although the terms 'transformation' and 'query' are usually strictly
used with different meanings, referring to the processes of XSLT and XQuery
respectively, it is the intention of this specification that they both
interchangeably refer to any processes performed on a node-based data source
to obtain a set of resultant data derived from the data source.
Appendix:

The following example is based on the use case "1.1 Use Case 'XMP':
Experiences and Examples" from the W3C specification "XML Query Use Cases" at
http://www.w3c.org/TR/xquery-use-cases. The example is used to
illustrate how an XML document is transformed and how the changes made to
the transformed document are updated in the source document by the method
disclosed in the description of this specification. Two sets of mock XML data
below show the prices of some books from two different bookstores, BN and
Amazon.
Sample Transformation and Query with auto source annotation



The following are examples of XSLT and XQuery scripts that generate a
combined transformed document from the earlier two XML book price
documents. After a transformation by either of the scripts, the prices of books
from bn.com and amazon.com are selected from the XML documents. The
result is formatted in HTML, displaying the titles and prices of books from the
two bookstores.

The query expressions/actions which are executed to create the transformed
document need to be followed by transfers of annotations/identifiers from the
source documents to the respective nodes in the transformed document. The
parts of the code in the scripts which trigger a transfer of identifiers are
highlighted in bold, for example, $b and $a.
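What auto source annotation asks of the engine can be sketched in a few lines: every result node built from a source node inherits that node's identifier. The record layout, the `combine` function and the "doc:node" identifier format are assumptions modelled on the annotated documents in this specification; ordinary XSLT and XQuery processors do not perform this transfer without the extension described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceValue:
    text: str
    node_id: str                       # "doc:node", e.g. "1:1003"


@dataclass
class SourceBook:
    title: SourceValue
    price: SourceValue


@dataclass
class Cell:
    text: str
    source_id: Optional[str] = None    # identifier inherited from the source


def combine(bn: List[SourceBook], amazon: List[SourceBook]) -> List[List[Cell]]:
    """Join the two book lists on title text. With auto source annotation,
    every cell copied from a source node inherits that node's identifier."""
    amazon_by_title = {b.title.text: b for b in amazon}
    rows: List[List[Cell]] = []
    for book in bn:
        other = amazon_by_title.get(book.title.text)
        if other is None:
            continue                   # only books listed by both stores
        rows.append([
            Cell(book.title.text, book.title.node_id),   # Title
            Cell(other.price.text, other.price.node_id), # Price from Amazon
            Cell(book.price.text, book.price.node_id),   # Price from BN
        ])
    return rows
```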
And the resultant HTML page rendered in an HTML browser or editor should
look like: 



Title                                          Price from Amazon   Price from BN
TCP/IP Illustrated                             65.95               65.95
Advanced Programming in the Unix environment   65.95               65.95
Data on the Web                                34.95               39.95



By the embodiments disclosed in the description, a user having read/write
access to the data can interact with the rendered HTML document through the
editing agent and undertake editing actions, for example by sending a scripted
command through a command console, or through an advanced user interface.
For example, the user might want to delete the second book and lower the price
of the last book.





Title                                          Price from Amazon   Price from BN
TCP/IP Illustrated                             65.95               65.95
Advanced Programming in the Unix environment   65.95               65.95    (row struck out: to be deleted)
Data on the Web                                34.95               39.95    (BN price edited to 34.60)



The request sent to the source agents might, for example, be scripted in a
syntax like this through a command console such as the MS Windows command
prompt:

> To www.amazon.com (as the entire row is bound to 2:2004):

delete doc123456/2004

> To www.bn.com (as the text is bound to 1:1009):
update doc123456/1009 to "34.60"
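On the receiving side, a source document agent has to parse such requests and apply them to the node bearing the given identifier. The following is a minimal sketch, assuming the agent holds one document as a flat table mapping node ID to parent ID and text, and accepts only the two commands in the mock syntax above. The regular expression, the `SourceDocumentAgent` class and the storage layout are assumptions; deleting a node here also deletes its descendants.

```python
import re
from typing import Dict

# Mock request syntax: 'delete doc123456/2004' or
# 'update doc123456/1009 to "34.60"', optionally followed by ';'.
COMMAND = re.compile(r'^(delete|update)\s+(\w+)/(\d+)(?:\s+to\s+"([^"]*)")?;?$')


class SourceDocumentAgent:
    """Hypothetical agent holding one source document as
    node ID -> {"parent": parent node ID or None, "text": text or None}."""

    def __init__(self, doc_id: str, nodes: Dict[int, dict]):
        self.doc_id = doc_id
        self.nodes = nodes

    def execute(self, command: str) -> None:
        match = COMMAND.match(command.strip())
        if match is None:
            raise ValueError(f"unrecognised command: {command!r}")
        action, doc_id, node_text, value = match.groups()
        node_id = int(node_text)
        if doc_id != self.doc_id or node_id not in self.nodes:
            raise KeyError(f"no node {doc_id}/{node_id}")
        if action == "update":
            if value is None:
                raise ValueError("update needs a 'to \"value\"' clause")
            self.nodes[node_id]["text"] = value
        else:
            self._delete(node_id)

    def _delete(self, node_id: int) -> None:
        # Collect the children first, then delete them recursively,
        # so the whole subtree disappears together with the node.
        children = [n for n, rec in self.nodes.items() if rec["parent"] == node_id]
        for child in children:
            self._delete(child)
        del self.nodes[node_id]
```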


In the above example, the user can update both source documents at the same
time, as all the XML source documents are annotated and the transformation
carries over all the node IDs. This type of non-selective annotation method is
termed 'auto source annotation', as all nodes are automatically annotated
regardless of any criterion.

In a more realistic scenario, a manager from BN might want to review the prices
of its books in comparison to those of its competitor Amazon. In this case, he is
allowed read-only rights to the data from Amazon, but read/write rights to the
data from BN. The transformation or query language therefore needs to be
extended to provide user selectivity. The following is an
example of an extended XSLT or XQuery script with author-specified source
annotation, termed 'author specific source annotation' in this specification.

In the XQuery script, in [$b], which represents data from the bn.com XML
document, the brackets indicate to the transformation engine that source node
identifiers should be inherited by the corresponding nodes in the transformed
document, so that editing by the viewer can take place. The information which is
to be extracted from amazon.com is not to be annotated with the identifiers from
the source document, and is scripted without brackets, i.e. $a, so that the
transformation engine will not transfer the identifiers.

In the equivalent XSLT script, the line of code which extracts data from the
bn.com XML document is marked with source-annotate="yes", indicating
to the transformation engine that source node identifiers should be passed on to
the corresponding nodes in the transformed document. The information which is
to be extracted from amazon.com is not to be annotated with the identifiers from
the source document, and is scripted with source-annotate="no".
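The difference between the [$b] / source-annotate="yes" case and the $a / source-annotate="no" case can be modelled as a filter on inherited identifiers. In the specification the transformation engine makes this decision while transforming; the Python sketch below simplifies it to a post-processing step, and the `Cell` type, the `select_annotation` and `editable` helpers and the "doc:node" identifier format are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Cell:
    text: str
    source_id: Optional[str] = None   # "doc:node", e.g. "1:1009"


def select_annotation(row: List[Cell], annotated_docs: Iterable[str]) -> List[Cell]:
    """Keep inherited identifiers only for documents the script author
    marked for annotation; values from other documents are passed
    through as plain, read-only text."""
    keep = set(annotated_docs)
    out = []
    for cell in row:
        doc = cell.source_id.split(":", 1)[0] if cell.source_id else None
        out.append(cell if doc in keep else replace(cell, source_id=None))
    return out


def editable(cell: Cell) -> bool:
    # Only annotated cells can be mapped back to a source node, so only
    # they would be rendered with the coloured background.
    return cell.source_id is not None
```

With document 1 (bn.com) marked for annotation, the BN price cell keeps its identifier and stays editable, while the Amazon price is displayed but cannot be traced back to its source.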
In the foregoing script examples, the editable fields will have a coloured
background in the HTML display: <td bgcolor=#ccccff>

The transformed or query result with source annotation should look like that shown
below, where only the bn.com data is annotated with node IDs:

<?UniqueSourceDocumentID www.bn.om/doc123456 as 1 ?>
<?UniqueSourceDocumentID www.amazon.om/doc123456 as 2 ?>
<html>
<body>
<table border=1 cellspacing=0>
  <tr>
    <td>Title</td> <td>Price from Amazon</td> <td>Price from BN</td>
  </tr>
  <tr:1:1001>     <- (whole row is annotated)
    <td>TCP/IP Illustrated</td>
    <td>65.95</td>     <- (cell with Amazon data)
    <td bgcolor=#ccccff>&1:1003;65.95</td>     <- (cell with BN data)
  </tr>
  <tr:1:1004>
    <td>Advanced Programming in the Unix environment</td>
    <td>65.95</td>
    <td bgcolor=#ccccff>&1:1006;65.95</td>     <- (cell with BN data)
  </tr>
  <tr:1:1007>
    <td>Data on the Web</td>
    <td>34.95</td>
    <td bgcolor=#ccccff>&1:1009;39.95</td>     <- (cell with BN data)
  </tr>
</table>
</body>
</html>



The result in an HTML browser or editor should look like:



Title                                          Price from Amazon   Price from BN
TCP/IP Illustrated                             65.95               65.95
Advanced Programming in the Unix environment   65.95               65.95
Data on the Web                                34.95               39.95

(The Price from BN cells are displayed with a coloured background.)
If the bn.com manager wants to delete the book "Advanced Programming in the
Unix environment" and change the bn.com price of the book "Data on the Web",
the requests to the source document agent might have a syntax, without
reference to the amazon.com nodes, which looks like:

To www.bn.com:

delete doc123456/1004

update doc123456/1009 to "34.60"

In general, auto source annotation (i.e. annotating all nodes) will work in simple
cases, and gives backward-compatible support for existing XSLT and XQuery
scripts because extra markup, such as the [ ] brackets for selective annotation, is
not needed, whereas author-specific source annotation gives the viewer finer
control over what is editable in the result document.

As described briefly in the description section of this specification, presentation
markup scripts like XHTML can be extended to allow the author to further specify
more editing functions, for example by a self-defined <listener> tag, or by other
interactive scripts like DHTML. As an example, the above XSLT script is
shown below, further modified to include a self-defined <listener> tag, by which
predefined updating commands can be sent to the source document agent.
This avoids having to use the command console to type in update requests, and
thus makes updating easier, for example through mouse clicks in web
browsers or other editing interfaces.
When the user double-clicks within the table row with the identifier 1004 tagged
with the <listener> tag, the editing agent may be programmed to construct the
following deletion action script (analogously, if a console-based interface is
used, the same script will be typed in by the viewer to be sent to the source
document agent):

To www.bn.com: delete doc123456/1004;

When the user types over the price within the cell with the identifier 1009, the
editing agent is able to construct the update request, in response to the 'key-down'
action on the keyboard, as:

To www.bn.com: update doc123456/1009 to "34.60"
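The way a listener turns user events into these request strings can be sketched as a small dispatch table. The event names, the `LISTENERS` table and the `on_event` function are hypothetical, and the sketch assumes the node identifier has already been resolved from the element that received the event; only the output strings follow the request syntax shown above.

```python
from typing import Callable, Dict, Optional

HOST = "www.bn.com"
DOC = "doc123456"


def _delete_row(node_id: str, _text: Optional[str]) -> str:
    # Double-click on an annotated row: delete the bound source node.
    return f"To {HOST}: delete {DOC}/{node_id};"


def _update_text(node_id: str, text: Optional[str]) -> str:
    # Typing in an annotated cell: replace the text of the bound node.
    if text is None:
        raise ValueError("key-down update needs the new text")
    return f'To {HOST}: update {DOC}/{node_id} to "{text}"'


# Hypothetical listener table: event type -> request builder.
LISTENERS: Dict[str, Callable[[str, Optional[str]], str]] = {
    "dblclick": _delete_row,
    "keydown": _update_text,
}


def on_event(event_type: str, node_id: str,
             new_text: Optional[str] = None) -> Optional[str]:
    """Return the request text for an event, or None if no listener applies."""
    builder = LISTENERS.get(event_type)
    return builder(node_id, new_text) if builder else None
```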
The updated result HTML rendered in an HTML browser or editor should
look like:



Title                Price from Amazon   Price from BN
TCP/IP Illustrated   65.95               65.95
Data on the Web      34.95               34.60


Inserting new data 

The following sample code shows the capability of copying new data obtained
from another source document into a first source document. The transformed
document lists all books and their prices from the bn.com source document, and
also shows the corresponding Amazon prices. The Amazon list has a price
entry missing for the book "The Economics of Technology and Content for
Digital TV".

Assume now that the viewer is a manager from Amazon.com without write
access to bn.com data (no longer the bn.com manager of the example
given earlier). The procedure below shows how the above-mentioned source
documents from both bn.com and amazon.com are transformed and combined,
and how a data insert (or, more accurately in this illustration, a data
copy action) may be effected by using the node identifiers. In this example, the
bn.com nodes are not annotated on transformation, while the amazon.com
nodes are.
The rendered result in an HTML browser or editor should look like:



[Rendered table with the columns Title, Price from BN and Price from Amazon, listing TCP/IP Illustrated; Advanced Programming in the Unix environment; Data on the Web; and The Economics of Technology and Content for Digital TV, for which the Price from Amazon cell is empty.]




When the amazon.com manager double-clicks on the empty cell, the editing
agent is triggered by the code copyPrice() in the <listener> tag to send the
source document agent a command to copy the price from bn.com into the
targeted amazon.com source node in which the copyPrice() command is nested,
for example:

To www.amazon.com:

invoke copyPrice(www.bn.com/doc123456/1:1010) on
www.amazon.com/doc123456

As shown in the XML document for bn.com above, 1:1010 is the identifier for 
the parent element and the child nodes: 



<book:1010 year="1999">
  <title>&1011;The Economics of Technology and Content for Digital TV</title>
  <publisher>Kluwer Academic Publishers</publisher>
  <price>&1012;129.95</price>
</book>
The stored procedure or mechanism behind the 'copyPrice()' command, scripted
for the insertion, might look like this:

copyPrice($sourceNodeID) {
  insert
    <entry>
      <title>{sourceNodeFromID($sourceNodeID)/title/text()}</title>
      <price>{sourceNodeFromID($sourceNodeID)/price/text()}</price>
      <review></review>
    </entry>
  into current node;  // i.e. node www.amazon.com/doc123456
}

The above stored procedure shows an example of how a command text, which
is triggered by the double-click, inserts tags such as <entry>, <title>,
<price> and <review> to enclose the new data, such that the structure is
consistent with the sibling nodes. Again, the actual strategy and implementation
of the insert mechanism, and how it is to work, are up to the developer. The
purpose of the example is to show that source annotation makes very
specific position identification possible.
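To make the stored procedure concrete, here is one possible realisation using Python's standard `xml.etree.ElementTree` instead of the mock insert syntax. Because the node identifiers in this specification use a mock syntax (`<book:1010>`, `&1011;`), the sketch assumes the identifier is carried as an `id` attribute on the bn.com `<book>` element; `copy_price`, that attribute and the shortened sample documents are assumptions.

```python
import xml.etree.ElementTree as ET

# bn.com source fragment; the id attribute stands in for <book:1010>.
BN_XML = """<bib>
  <book id="1010" year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>"""

AMAZON_XML = """<reviews>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>One of the best books on TCP/IP.</review>
  </entry>
</reviews>"""


def copy_price(source_root: ET.Element, source_node_id: str,
               target_root: ET.Element) -> ET.Element:
    """Copy title and price of the identified source <book> into a new
    <entry> appended to the target, mirroring the sibling entry layout."""
    book = source_root.find(f".//book[@id='{source_node_id}']")
    if book is None:
        raise KeyError(f"source node {source_node_id} not found")
    entry = ET.SubElement(target_root, "entry")
    ET.SubElement(entry, "title").text = book.findtext("title")
    ET.SubElement(entry, "price").text = book.findtext("price")
    ET.SubElement(entry, "review")   # empty, consistent with the siblings
    return entry


bn = ET.fromstring(BN_XML)
amazon = ET.fromstring(AMAZON_XML)
new_entry = copy_price(bn, "1010", amazon)
```

As in the specification, the new entry carries no identifier of its own; it would only receive one on the next cycle of transformation.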
The updated source document in amazon.com should then look like:



<?UniqueDocumentID www.amazon.com/doc123456 ?>
<reviews:2000>
  <entry:2001>
    <title>&2002;Data on the Web</title>
    <price>&2003;34.95</price>
    <review>A very good discussion ...</review>
  </entry>
  <entry:2004>
    <title>&2005;Advanced Programming in the Unix environment</title>
    <price>&2006;65.95</price>
    <review>A clear and detailed discussion of UNIX programming.</review>
  </entry>
  <entry:2007>
    <title>&2008;TCP/IP Illustrated</title>
    <price>&2009;65.95</price>
    <review>One of the best books on TCP/IP.</review>
  </entry>
  <entry>
    <title>The Economics of Technology and Content for Digital TV</title>
    <price>129.95</price>     //copied from www.bn.com
    <review></review>
  </entry>
</reviews>


Note that the updated source document from amazon.com now has a new price
"129.95" for the book "The Economics of Technology and Content for Digital TV".
The new node is not annotated until it goes through a cycle of
transformation.

It can be envisaged by the skilled man that, as mentioned in the description,
reference may be made to the identifiers of the sibling cells to specify the
position in the amazon.com source document into which the new data should
be copied.

The examples show how the annotation system allows data manipulation in an
XML document, such that command procedures and routines may be designed
to work on the documents in the same way that query languages are used
to modify tables in other types of database.



